
fix: file size mismatch when download interrupted #176

Merged
rtuszik merged 5 commits into rtuszik:main from sEpt0r:fix-downloader
Oct 31, 2025

Conversation

Contributor

@sEpt0r sEpt0r commented Oct 30, 2025

Problem

When downloading large files (e.g., a 117GB photon-db), connection failures cause the entire download to restart from scratch instead of resuming, because the download state is saved only every 1MB.

Why it happens:

  1. Download state is saved every 1MB (save_interval = 1024 * 1024)
  2. When the connection drops mid-download, the file contains more data than the last saved state
  3. On retry, the exact size check fails
  4. The script logs "File size mismatch, starting fresh download" and deletes the state
  5. The download restarts from 0 bytes instead of resuming
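The failing check in step 3 can be sketched as follows (a minimal illustration with assumed names, not the project's actual code):

```python
import os

def can_resume_strict(file_path: str, saved_size: int) -> bool:
    """Hypothetical sketch of the pre-fix check: resume only when the
    on-disk size matches the saved state exactly."""
    actual = os.path.getsize(file_path)
    # Any bytes written after the last 1MB state save make
    # actual > saved_size, so this returns False, the state is deleted,
    # and the download restarts from zero.
    return actual == saved_size
```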

Fix

This fix ensures that progress is saved even when data is written between periodic saves (every 1MB), preventing the loss of up to 1MB of progress.

Additionally, it increases the connection timeout to 30s and the read timeout to 60s, and adds exponential backoff between retries to make downloads more reliable on slow or unstable networks.
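A minimal sketch of the persistence fix, with hypothetical names (write_with_state and save_state are illustrative, not the module's real API):

```python
def write_with_state(chunks, out, state, save_state, save_interval=1024 * 1024):
    """Write chunks to `out`, saving progress every `save_interval`
    bytes AND whenever the loop exits, cleanly or not."""
    written = state.get("size", 0)
    unsaved = 0
    try:
        for chunk in chunks:
            out.write(chunk)
            written += len(chunk)
            unsaved += len(chunk)
            if unsaved >= save_interval:
                state["size"] = written
                save_state(state)
                unsaved = 0
    finally:
        # Runs on success and on exception alike, so the saved state
        # always matches the bytes actually on disk.
        state["size"] = written
        save_state(state)
```

The `finally` block is what closes the window: even if the connection drops mid-chunk, the state on disk reflects every byte that was written.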

Test

I built a new container and ran tests on my slowest setup.
After a network interruption, the Python script continued downloading the file:

2025-10-30 22:09:53,250 - root - INFO - Download progress: 6.8% (7.93GB / 116.98GB) - 47.3 Mbps - ETA: 5h 30m
2025-10-30 22:10:03,250 - root - INFO - Download progress: 6.8% (7.98GB / 116.98GB) - 48.2 Mbps - ETA: 5h 23m
2025-10-30 22:10:13,251 - root - INFO - Download progress: 6.9% (8.04GB / 116.98GB) - 47.5 Mbps - ETA: 5h 28m
2025-10-30 22:11:16,951 - root - WARNING - Download attempt 1 failed: HTTPSConnectionPool(host='r2.koalasec.org', port=443): Read timed out.
2025-10-30 22:11:16,954 - root - INFO - Waiting 1s before retry...
2025-10-30 22:11:17,956 - root - INFO - Retrying download (attempt 2/3)...
2025-10-30 22:11:17,960 - root - INFO - Resuming download: file size 8654962688 bytes (saved state: 8654962688 bytes)
2025-10-30 22:11:17,961 - root - INFO - Resuming download from byte 8654962688
2025-10-30 22:11:18,851 - root - INFO - Starting download of 116.98GB to photon-db-latest.tar.bz2
2025-10-30 22:11:28,854 - root - INFO - Download progress: 6.9% (8.11GB / 116.98GB) - 38.5 Mbps - ETA: 6h 44m
2025-10-30 22:11:38,856 - root - INFO - Download progress: 7.0% (8.16GB / 116.98GB) - 47.0 Mbps - ETA: 5h 31m
2025-10-30 22:11:48,859 - root - INFO - Download progress: 7.0% (8.22GB / 116.98GB) - 47.2 Mbps - ETA: 5h 29m

It may still be worth increasing the read timeout further (e.g., from timeout=(30, 60) to timeout=(30, 300)), but at least the download now continues instead of restarting from the beginning.
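For reference, the (connect, read) tuple accepted by the requests library separates the two timeout phases (a sketch of the semantics, not the downloader's code):

```python
# requests accepts a (connect, read) tuple: the first bound covers
# establishing the TCP/TLS connection, the second covers the wait
# between received chunks while streaming. Raising only the read
# timeout (e.g. to 300) tolerates slow mirrors without slowing the
# detection of hosts that never answer at all.
CONNECT_TIMEOUT = 30
READ_TIMEOUT = 60
TIMEOUT = (CONNECT_TIMEOUT, READ_TIMEOUT)

# Usage (assuming the `requests` library the downloader already uses):
# response = requests.get(url, stream=True, timeout=TIMEOUT)
```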

Fixes #169

chore: replace workflow repo tags by variables
When downloading large files (e.g., 117GB photon-db), connection
failures cause the entire download to restart from scratch instead of
resuming, because download state is saved every 1MB.

When connection drops mid-download, the file has MORE data than the last
saved state and on retry, the exact size check fails.

This fix ensures progress is saved even when data is written between
periodic saves (every 1MB), preventing loss of up to 1MB of progress.

Also, increased the connection timeout (30s) and read timeout (60s), and
added backoff between retries to be more reliable on slow and unstable
networks.
@coderabbitai

coderabbitai bot commented Oct 30, 2025

📝 Walkthrough


Updates to the downloader module introduce more robust download handling with improved resume logic, error handling with state persistence, extended HTTP timeouts with differentiated connect and read settings, and exponential backoff retry logic on request failures.

Changes

Cohort / File(s): Download robustness enhancements (src/downloader.py)
Change Summary: Enhanced resume logic in load_download_state to treat resumed downloads as valid when the actual file size matches or exceeds the saved size, updating state accordingly; wrapped the _download_content loop with try/except to persist state on both completion and exception before re-raising; changed the _perform_download HTTP timeout from a single value to a (30, 60) tuple for separate connect and read timeouts; implemented exponential backoff (1s, 2s, 4s) retry logic in download_file on RequestException, with logging at each attempt; ensured save_download_state is called after successful downloads.

Pre-merge checks

❌ Failed checks (1 warning)
Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (4 passed)
Title Check (✅ Passed): The title "fix: file size mismatch when download interrupted" is concise and specific, clearly summarising the primary change which directly addresses the main issue identified in the pull request. It accurately reflects that the PR tackles the core problem where download interruptions caused incorrect file size checks, leading to full restart rather than resumption. The title provides enough clarity for a developer scanning the history to understand the key improvement without being vague or overly broad.
Linked Issues Check (✅ Passed): The code changes directly address all primary objectives from issue #169. The resume logic enhancement addresses the file size mismatch problem by comparing actual file size against saved state rather than requiring exact equality, preventing unnecessary fresh downloads. The timeout extension from 20 seconds to a (30, 60) tuple addresses the read timeout failures on mirrors. The exponential backoff implementation (1s, 2s, 4s delays) and improved retry flow tackle the challenge of exceeding maximum attempts. State persistence after each download and on exceptions ensures progress is not lost between saves, collectively enabling reliable download and extract of large files as requested in the linked issue.
Out of Scope Changes Check (✅ Passed): All changes in the pull request are directly scoped to the stated objectives from issue #169. The modifications to the downloader logic, including resume logic, error handling, state persistence, timeout configuration, and retry backoff, are all specifically aimed at fixing the download interruption issues and file size mismatch problems. The raw summary confirms no alterations to exported or public entities and no unrelated changes have been introduced. Every change serves a clear purpose in improving download reliability on slow or unstable networks.
Description Check (✅ Passed): The pull request description is well-structured and directly related to the changeset, detailing the problem (file size mismatch causing restart), the root cause (state saved only every 1MB), and the solution (improved state persistence, extended timeouts, and exponential backoff retry logic). The description also includes concrete test evidence from actual testing on a slow setup, demonstrating that the fix works in practice with log excerpts showing successful resumption after network interruption. This level of detail and relevance clearly satisfies the lenient pass criterion for this check.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between fd434ee and 39771e8.

📒 Files selected for processing (1)
  • src/downloader.py (4 hunks)
🔇 Additional comments (4)
src/downloader.py (4)

116-123: Graceful resume when the file is ahead of the saved state

Allowing the resume path whenever the on-disk file has at least as many bytes as the recorded state keeps the safety net against truncation while eliminating the old restart penalty. The extra logging makes the decision path clear in logs too.
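The relaxed check described here can be sketched like this (assumed function shape; the real load_download_state differs):

```python
import os

def resume_offset(file_path: str, saved_size: int) -> int:
    """Return the byte offset to resume from (sketch; names and
    shape are illustrative, not the module's real API)."""
    actual = os.path.getsize(file_path) if os.path.exists(file_path) else 0
    if saved_size > 0 and actual >= saved_size:
        # File is at or ahead of the recorded state: resume from the
        # actual end of the file and treat that as the new state.
        return actual
    # File is missing or shorter than the state claims (possible
    # truncation): safer to start fresh.
    return 0
```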


397-432: Solid guard that always persists progress

Persisting the state after every loop—in both the clean exit path and the exception path—seals the window where we could lose up to 1 MB. This is exactly what was biting the large downloads.


459-459: Timeout split matches real-world mirror latency

Moving to (30, 60) keeps the connect phase snappy while letting slow mirrors stream without tripping the read timeout. Sensible choice.


514-516: Back-off is a good fit for flaky networks

The exponential wait before retries gives mirrors a breather without making the user wait forever. Nice incremental resilience win.
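The back-off policy can be sketched generically (with_retries is a hypothetical wrapper; the real code catches requests.RequestException inside download_file):

```python
import time

def with_retries(operation, max_attempts=3, base_delay=1.0):
    """Retry `operation` with exponential backoff between attempts
    (1s, 2s, 4s with the defaults)."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the last error
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed: {exc}; retrying in {delay:.0f}s")
            time.sleep(delay)
```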



Owner

@rtuszik rtuszik left a comment


LGTM

@rtuszik
Owner

rtuszik commented Oct 31, 2025

Thanks for the fix!

@rtuszik rtuszik changed the base branch from main to dev October 31, 2025 17:18
@rtuszik rtuszik changed the base branch from dev to main October 31, 2025 17:35
@rtuszik rtuszik merged commit 8bd31a5 into rtuszik:main Oct 31, 2025
9 checks passed


Development

Successfully merging this pull request may close these issues.

[BUG] Unable to download planet, max. 3 download attempts, read timed outs and file size mismatches
